StreamKrimp: Detecting Change in Data Streams
Authors
Abstract
Data streams are ubiquitous. Examples range from sensor networks to financial transactions and website logs. In fact, even market basket data can be seen as a stream of sales. Detecting changes in the distribution a stream is sampled from is one of the most challenging problems in stream mining, as only limited storage can be used. In this paper we analyse this problem for streams of transaction data from an MDL perspective. Based on this analysis we introduce the STREAMKRIMP algorithm, which uses the KRIMP algorithm to characterise probability distributions with code tables. With these code tables, STREAMKRIMP partitions the stream into a sequence of substreams. Each switch of code table indicates a change in the underlying distribution. Experiments on both real and artificial streams show that STREAMKRIMP detects the changes while using only a very limited amount of data storage.
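The partitioning idea in the abstract can be illustrated with a small sketch. This is not the STREAMKRIMP algorithm and does not use KRIMP code tables: as a loudly simplified stand-in, each "code table" here is a Laplace-smoothed table of per-item code lengths, and the stream is read in fixed-size blocks. The names `build_model`, `encoded_size`, `segment_stream`, and the parameters `block_size` and `min_gain` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def build_model(block, alphabet):
    """Stand-in for a code table: Laplace-smoothed code length (bits) per item."""
    counts = Counter(item for transaction in block for item in transaction)
    total = sum(counts.values()) + len(alphabet)
    return {item: -math.log2((counts[item] + 1) / total) for item in alphabet}


def encoded_size(block, model):
    """Total number of bits needed to encode every transaction in the block."""
    return sum(model[item] for transaction in block for item in transaction)


def segment_stream(stream, alphabet, block_size=50, min_gain=0.1):
    """Return the start indices of the substreams found in `stream`.

    Only one block and one model are held at a time. A model fitted to the
    new block is adopted when it compresses that block more than `min_gain`
    (as a relative saving) better than the current model does; the threshold
    guards against switching on sampling noise (assumed heuristic).
    """
    boundaries = [0]
    current = build_model(stream[:block_size], alphabet)
    for start in range(block_size, len(stream), block_size):
        block = stream[start:start + block_size]
        candidate = build_model(block, alphabet)
        old_bits = encoded_size(block, current)
        new_bits = encoded_size(block, candidate)
        if old_bits > 0 and (old_bits - new_bits) / old_bits > min_gain:
            current = candidate          # switch of "code table"
            boundaries.append(start)     # marks a detected change
    return boundaries


# Artificial stream: the item distribution changes after 100 transactions.
stream = [("a", "b")] * 100 + [("c", "d")] * 100
print(segment_stream(stream, alphabet={"a", "b", "c", "d"}))
```

On this artificial stream the sketch reports boundaries at positions 0 and 100, that is, one change where the item distribution switches. The fixed block size and the relative-gain threshold are assumptions of the sketch, not choices taken from the paper.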
Similar resources
Detecting the Change of Clustering Structure in Categorical Data Streams
Analyzing clustering structures in data streams can provide critical information for making decisions in real time. Most research has focused on clustering algorithms for data streams. We argue that, more importantly, we need to monitor the change of clustering structure online. In this paper, we present a framework for detecting the change of critical clustering structure in categorical dat...
Detecting Changes in Unlabeled Data Streams Using Martingale
The martingale framework for detecting changes in data streams, currently only applicable to labeled data, is extended here to unlabeled data using the concept of clustering. The one-pass incremental change-detection algorithm (i) does not require a sliding window on the data stream, (ii) does not require monitoring the performance of the clustering algorithm as data points are streaming, and (iii) work...
A PCA-Based Change Detection Framework for Multidimensional Data Streams
Detecting changes in multidimensional data streams is an important and challenging task. In unsupervised change detection, changes are usually detected by comparing the distribution in a current (test) window with a reference window. It is thus essential to design divergence metrics and density estimators for comparing the data distributions, which are mostly done for univariate data. Detecting...
Detecting Changes in Twitter Streams using Temporal Clusters of Hashtags
Detecting events from social media data has important applications in public security, political issues, and public health. Many studies have focused on detecting specific or unspecific events from Twitter streams. However, not much attention has been paid to detecting changes, and their impact, in online conversations related to an event. We propose methods for detecting such changes, using cl...
Influence of Stream channel morphology and in-stream habitats on fish community in Golestan province Streams
Four streams of different sizes were selected for studying the effects of environmental factors on fish assemblages using indirect (Detrended Correspondence Analysis, DCA) and direct (Redundancy Analysis, RDA) gradient analysis in Golestan province. DCA of presence-absence and relative abundance data showed a clear gradient and a linear model of species variability. In the within-site RDA, environ...
Publication date: 2008